1,456 research outputs found

    Incentivizing Exploration with Heterogeneous Value of Money

    Full text link
    Recently, Frazier et al. proposed a natural model for crowdsourced exploration of different a priori unknown options: a principal is interested in the long-term welfare of a population of agents who arrive one by one in a multi-armed bandit setting. However, each agent is myopic, so in order to incentivize him to explore options with better long-term prospects, the principal must offer the agent money. Frazier et al. showed that a simple class of policies, called time-expanded policies, is optimal in the worst case, and characterized their budget-reward tradeoff. The previous work assumed that all agents are equally and uniformly susceptible to financial incentives. In reality, agents may have different utility for money. We therefore extend the model of Frazier et al. to allow agents with heterogeneous and non-linear utilities for money. The principal is informed of the agent's tradeoff via a signal that could be more or less informative. Our main result shows that a convex program can be used to derive a signal-dependent time-expanded policy which achieves the best possible Lagrangian reward in the worst case. The worst-case guarantee is matched by so-called "Diamonds in the Rough" instances; the proof that the guarantees match is based on showing that two different convex programs have the same optimal solution for these specific instances. These results also extend to the budgeted case as in Frazier et al. We also show that the optimal policy is monotone with respect to information, i.e., the approximation ratio of the optimal policy improves as the signals become more informative. Comment: WINE 201
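    As a rough illustration of the incentive problem described above (not the paper's time-expanded policy), the sketch below computes the minimum payment a principal would have to offer a myopic agent to make an under-explored arm attractive, assuming a linear value of money for the agent; the function name and all numbers are hypothetical.

```python
# Toy sketch of the incentive-to-explore problem described above.
# Assumptions (not from the paper): two arms with known posterior mean
# rewards, a myopic agent, and a linear value of money `lam` for the agent.

def min_payment_to_explore(mean_exploit: float, mean_explore: float, lam: float) -> float:
    """Smallest payment that makes the exploratory arm at least as
    attractive as the myopically best arm for an agent who values
    one unit of money at `lam` units of reward."""
    gap = mean_exploit - mean_explore          # myopic loss from exploring
    return max(gap, 0.0) / lam                 # payment compensating that loss

if __name__ == "__main__":
    # The agent believes arm A pays 0.7 on average, arm B (under-explored) pays 0.5.
    # An agent with lam = 2.0 (values money highly) needs only half the payment
    # of an agent with lam = 1.0 -- the heterogeneity the paper studies.
    print(min_payment_to_explore(0.7, 0.5, lam=1.0))   # 0.2
    print(min_payment_to_explore(0.7, 0.5, lam=2.0))   # 0.1
```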

    On the Prior Sensitivity of Thompson Sampling

    Full text link
    The empirically successful Thompson Sampling algorithm for stochastic bandits has drawn much interest in understanding its theoretical properties. One important benefit of the algorithm is that it allows domain knowledge to be conveniently encoded as a prior distribution to balance exploration and exploitation more effectively. While it is generally believed that the algorithm's regret is low (high) when the prior is good (bad), little is known about the exact dependence. In this paper, we fully characterize the algorithm's worst-case dependence of regret on the choice of prior, focusing on a special yet representative case. These results also provide insights into the general sensitivity of the algorithm to the choice of priors. In particular, with $p$ being the prior probability mass of the true reward-generating model, we prove $O(\sqrt{T/p})$ and $O(\sqrt{(1-p)T})$ regret upper bounds for the bad- and good-prior cases, respectively, as well as matching lower bounds. Our proofs rely on the discovery of a fundamental property of Thompson Sampling and make heavy use of martingale theory, both of which appear novel in the literature, to the best of our knowledge. Comment: Appears in the 27th International Conference on Algorithmic Learning Theory (ALT), 201
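    For readers unfamiliar with how a prior enters Thompson Sampling, here is a minimal sketch (not the construction analysed in the paper) of Thompson Sampling over a small finite set of candidate reward models; the prior mass placed on the true model plays the role of $p$ in the bounds above, and the models, prior weights, and horizon are illustrative assumptions.

```python
import random

# Minimal Thompson Sampling over a finite set of candidate models.
# Each model specifies Bernoulli means for two arms; `prior` is the mass
# placed on each model, and the mass on the true model plays the role of p.

models = [(0.9, 0.1), (0.1, 0.9)]   # candidate (arm 0, arm 1) mean rewards
true_model = 0                       # ground truth: arm 0 is best
prior = [0.2, 0.8]                   # p = 0.2 on the true model (a "bad" prior)

def run(T: int, seed: int = 0) -> float:
    rng = random.Random(seed)
    posterior = list(prior)
    regret = 0.0
    best = max(models[true_model])
    for _ in range(T):
        # Thompson step: sample a model from the posterior, act greedily for it.
        m = rng.choices(range(len(models)), weights=posterior)[0]
        arm = 0 if models[m][0] >= models[m][1] else 1
        reward = 1.0 if rng.random() < models[true_model][arm] else 0.0
        regret += best - models[true_model][arm]
        # Exact Bayesian update of the posterior over candidate models.
        likes = [mu[arm] if reward else 1 - mu[arm] for mu in models]
        posterior = [w * l for w, l in zip(posterior, likes)]
        z = sum(posterior)
        posterior = [w / z for w in posterior]
    return regret

if __name__ == "__main__":
    print(run(T=1000))   # cumulative pseudo-regret under the misspecified prior
```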

    The determinants of longevity: The perspectives from East Asian economies

    Get PDF

    Bandit Models of Human Behavior: Reward Processing in Mental Disorders

    Full text link
    Drawing inspiration from behavioral studies of human decision making, we propose here a general parametric framework for the multi-armed bandit problem, which extends the standard Thompson Sampling approach to incorporate reward processing biases associated with several neurological and psychiatric conditions, including Parkinson's and Alzheimer's diseases, attention-deficit/hyperactivity disorder (ADHD), addiction, and chronic pain. We demonstrate empirically that the proposed parametric approach can often outperform the baseline Thompson Sampling on a variety of datasets. Moreover, from the behavioral modeling perspective, our parametric framework can be viewed as a first step towards a unifying computational model capturing reward processing abnormalities across multiple mental conditions. Comment: Conference on Artificial General Intelligence, AGI-1
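    A minimal sketch of the general idea of biasing reward processing inside Bernoulli Thompson Sampling is shown below; the asymmetric weights `w_pos` and `w_neg` and the arm means are illustrative assumptions, not the paper's parametrisation.

```python
import random

# Sketch of Thompson Sampling whose Beta-posterior updates are scaled by
# reward-processing weights, loosely in the spirit of the framework above.
# The weights (w_pos, w_neg) and arm means are illustrative assumptions.

def biased_thompson(means, T, w_pos=1.0, w_neg=1.0, seed=0):
    rng = random.Random(seed)
    n_arms = len(means)
    alpha = [1.0] * n_arms          # Beta prior pseudo-successes
    beta = [1.0] * n_arms           # Beta prior pseudo-failures
    total = 0.0
    for _ in range(T):
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(n_arms)]
        arm = max(range(n_arms), key=lambda i: samples[i])
        reward = 1.0 if rng.random() < means[arm] else 0.0
        total += reward
        # Biased update: gains and losses are weighted asymmetrically,
        # mimicking distorted reward processing.
        alpha[arm] += w_pos * reward
        beta[arm] += w_neg * (1.0 - reward)
    return total

if __name__ == "__main__":
    arms = [0.3, 0.5, 0.7]
    print(biased_thompson(arms, T=2000))                        # standard TS
    print(biased_thompson(arms, T=2000, w_pos=0.5, w_neg=1.5))  # loss-sensitive variant
```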

    Limit theorems for delayed sums of random sequence

    Get PDF

    Understanding the Spatial Clustering of Severe Acute Respiratory Syndrome (SARS) in Hong Kong

    Get PDF
    We applied cartographic and geostatistical methods in analyzing the patterns of disease spread during the 2003 severe acute respiratory syndrome (SARS) outbreak in Hong Kong using geographic information system (GIS) technology. We analyzed an integrated database that contained clinical and personal details on all 1,755 patients confirmed to have SARS from 15 February to 22 June 2003. Elementary mapping of disease occurrences in space and time simultaneously revealed the geographic extent of spread throughout the territory. Statistical surfaces created by the kernel method confirmed that SARS cases were highly clustered and identified distinct disease “hot spots.” Contextual analysis of mean and standard deviation of different density classes indicated that the period from day 1 (18 February) through day 16 (6 March) was the prodrome of the epidemic, whereas days 86 (15 May) to 106 (4 June) marked the declining phase of the outbreak. Origin-and-destination plots showed the directional bias and radius of spread of superspreading events. Integration of GIS technology into routine field epidemiologic surveillance can offer a real-time quantitative method for identifying and tracking the geospatial spread of infectious diseases, as our experience with SARS has demonstrated.
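    As a rough illustration of the kernel method used above to turn point locations of cases into a smooth intensity surface, here is a minimal Gaussian kernel-density sketch; the case coordinates, grid, and bandwidth are made-up illustrative values, not data from the study.

```python
import math

# Minimal Gaussian kernel-density sketch: smooth point locations of cases
# into an intensity surface whose peaks flag "hot spots".
# Case coordinates and bandwidth below are hypothetical.

def kernel_density(cases, x, y, bandwidth):
    """Smoothed case intensity at grid point (x, y)."""
    total = 0.0
    for cx, cy in cases:
        d2 = (x - cx) ** 2 + (y - cy) ** 2
        total += math.exp(-d2 / (2 * bandwidth ** 2))
    return total / (2 * math.pi * bandwidth ** 2 * len(cases))

if __name__ == "__main__":
    cases = [(0.10, 0.20), (0.15, 0.22), (0.90, 0.80)]   # hypothetical case locations
    # Evaluate the surface on a coarse grid; high values indicate clustering.
    for gx in (0.1, 0.5, 0.9):
        row = [round(kernel_density(cases, gx, gy, bandwidth=0.1), 2)
               for gy in (0.2, 0.5, 0.8)]
        print(gx, row)
```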

    Molecular Identification of Spirometra erinaceieuropaei Tapeworm in Cases of Human Sparganosis, Hong Kong

    Get PDF
    Human sparganosis is a foodborne zoonosis endemic in Asia. We report a series of 9 histologically confirmed human sparganosis cases in Hong Kong, China. All parasites were retrospectively identified as Spirometra erinaceieuropaei. Skin and soft tissue swelling was the most common symptom, followed by central nervous system lesions.

    Treatment of severe acute respiratory syndrome with lopinavir/ritonavir: A multicentre retrospective matched cohort study

    Get PDF
    Objectives. To investigate the possible benefits and adverse effects of the addition of lopinavir/ritonavir to a standard treatment protocol for the treatment of severe acute respiratory syndrome. Design. Retrospective matched cohort study. Setting. Four acute regional hospitals in Hong Kong. Patients and methods. Seventy-five patients with severe acute respiratory syndrome treated with lopinavir/ritonavir in addition to a standard treatment protocol adopted by the Hospital Authority were matched with controls retrieved from the Hospital Authority severe acute respiratory syndrome central database. Matching was done with respect to age, sex, the presence of co-morbidities, lactate dehydrogenase level and the use of pulse steroid therapy. The 75 patients treated with lopinavir/ritonavir were divided into two subgroups for analysis: lopinavir/ritonavir as initial treatment, and lopinavir/ritonavir as rescue therapy. These groups were compared with matched cohorts of 634 and 343 patients, respectively. Outcomes including overall death rate, oxygen desaturation, intubation rate, and use of pulse methylprednisolone were reviewed. Results. The addition of lopinavir/ritonavir as initial treatment was associated with a reduction in the overall death rate (2.3%) and intubation rate (0%), when compared with a matched cohort who received standard treatment (15.6% and 11.0% respectively, P<0.05), and a lower rate of use of methylprednisolone at a lower mean dose. The subgroup who had received lopinavir/ritonavir as rescue therapy showed no difference in overall death rate and rates of oxygen desaturation and intubation compared with the matched cohort, and received a higher mean dose of methylprednisolone. Conclusion. The addition of lopinavir/ritonavir to a standard treatment protocol as an initial treatment for severe acute respiratory syndrome appeared to be associated with improved clinical outcome. A randomised double-blind placebo-controlled trial is recommended during future epidemics to further evaluate this treatment.

    Prospective modelling of environmental dynamics. A methodological comparison applied to mountain land cover changes

    Get PDF
    During the last 10 years, scientists have made significant advances in modelling environmental dynamics. A wide range of new methodological approaches in geomatics - such as neural networks, multi-agent systems or fuzzy logic - was developed. Despite this progress, the available modelling software must be considered experimental tools rather than mature procedures suitable for environmental management or decision support. In particular, the authors consider that a large number of publications suffer from a lack of validation of the model results. This contribution describes three different modelling approaches applied to prospective land cover prediction. The first one, a combined geomatic method, uses Markov chains for temporal transition prediction, while the spatial allocation of these transitions is supervised manually through the construction of suitability maps (a sketch of the Markov-chain step follows this abstract). Compared with this supervised method, the other two may be considered semi-automatic because both the polychotomous regression and the multilayer perceptron only need to be optimized during a training step - the algorithms themselves detect the spatio-temporal changes in land cover. The authors describe the three methodological approaches and their practical application to two mountain study areas: one in the French Pyrenees, the other covering a large part of the Sierra Nevada, Spain. The article focuses on the comparison of results. The main result is that prediction scores are higher where land cover is more persistent. The authors also underline that the geomatic model is complementary to the statistical ones, which achieve a higher overall prediction rate but produce worse simulations when land cover changes are numerous.
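    The sketch below illustrates only the Markov-chain component of the combined geomatic method: estimating class-to-class transition probabilities from two dated land-cover maps and projecting class proportions one step forward. The class names and cross-tabulation counts are hypothetical, and the spatial allocation via suitability maps mentioned above is a separate step not shown here.

```python
# Markov-chain step of the combined geomatic method (illustrative sketch):
# estimate transition probabilities from a cross-tabulation of two dated
# land-cover maps, then project class proportions one step forward.
# Class names and counts below are hypothetical.

CLASSES = ["forest", "grassland", "crops"]

# counts[i][j] = number of cells that were class i at t0 and class j at t1
counts = [
    [80, 15, 5],    # forest -> forest / grassland / crops
    [10, 70, 20],   # grassland -> ...
    [5, 10, 85],    # crops -> ...
]

def transition_matrix(counts):
    """Row-normalise the cross-tabulation into transition probabilities."""
    return [[c / sum(row) for c in row] for row in counts]

def project(proportions, P):
    """One Markov step: share of class j at t+1 = sum_i share_i * P[i][j]."""
    n = len(P)
    return [sum(proportions[i] * P[i][j] for i in range(n)) for j in range(n)]

if __name__ == "__main__":
    P = transition_matrix(counts)
    shares_t1 = [0.4, 0.3, 0.3]          # hypothetical class shares at t1
    shares_t2 = project(shares_t1, P)    # predicted shares at t2
    for name, s in zip(CLASSES, shares_t2):
        print(f"{name}: {s:.3f}")
```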